DNA METHYLATION PATTERNS

Field of the Invention 

The present invention relates to genomics and in particular to a method for the rapid
assessment of the genome-wide distribution of DNA methylation using restriction endonucleases
which are sensitive to methylation within their recognition site, in combination with size
fractionation of the restricted DNA and hybridization to a DNA chip.

Background 

DNA methylation is a ubiquitous biological process that occurs in diverse organisms
ranging from bacteria to humans. During this process, DNA methyltransferases catalyze
primarily the post-replicative addition of a methyl group to the N6 position of adenine or the C5
position of cytosine. In higher eukaryotes, DNA methylation plays a central role in epigenetic
regulation of gene expression and in particular in transcriptional gene silencing, genomic
imprinting and embryonic development. Aberrations in DNA methylation have been implicated
in aging and various diseases including cancer.

A method for determining the methylation state within a genomic sequence context, based
on the use of restriction endonucleases which require the recognition sequence to be
unmethylated to allow cleavage at the site, has been described by Bird et al., J. Mol. Biol. 118,
27-47, 1978. The fragments resulting from cleavage of unmethylated recognition sites are separated
by gel electrophoresis, transferred to a membrane and hybridized to a labeled probe
corresponding to the DNA fragment to be examined. The resulting hybridization pattern reflects
the methylation pattern of the DNA. The sensitivity of this method was increased in a variant
combined with PCR (Shemer, R. et al., Proc. Natl. Acad. Sci. USA 93, 6371-6376, 1996).
Amplification by two primers located on either side of the recognition sequence only occurs after
cleavage if the recognition sequence is in the methylated form. With both variants, only the
methylation of individual positions is examined.

A new technology, called DNA microarray technology, is attracting more and more
interest among biologists. This technology provides means to monitor almost whole genomes on
a single chip so that researchers can analyze thousands of genes simultaneously. Terms
that have been used in the literature to describe this technology include, but are not limited to:
biochip, DNA chip, DNA microarray, probe array and gene array. Affymetrix, Inc. owns a
registered trademark, GeneChip®, which refers to its high density, oligonucleotide-based DNA
arrays. In the scientific literature, however, the term "gene chip(s)" is frequently used as a general
term for DNA microarray technology.

An array is an orderly arrangement of DNA samples. It provides a medium for matching 
known and unknown DNA samples based on base-pairing rules and automating the process of 
identifying the unknowns. An array experiment can make use of common assay systems such as 
microplates or standard blotting membranes, and can be created by hand or make use of robotics 
to deposit the sample. In general, arrays are described as macroarrays or microarrays, the 
difference being the size of the sample spots. Macroarrays contain sample spot sizes of about 
300 microns or larger and can be easily imaged by existing gel and blot scanners. The sample
spot sizes in microarrays are typically less than 200 microns in diameter and these arrays usually
contain thousands of spots. DNA microarrays, or DNA chips, are fabricated by robotics, generally
on glass but sometimes on nylon substrates. Probes of known identity are used to
determine complementary binding, thus allowing massively parallel gene
expression and gene discovery studies. An experiment with a single DNA chip can provide
researchers with information on thousands of genes simultaneously. With regard to the terminology
of the hybridization partners it should be noted that in connection with microarrays a "probe" is 
frequently considered to be a tethered nucleic acid with known sequence, whereas a "target" is a 
free nucleic acid sample whose identity or abundance is analyzed. 

There are two major application forms for the microarray technology: (1) identification
of a nucleotide sequence such as a gene or gene mutation and (2) determination of the expression
level or abundance of a nucleotide sequence. There are also two variants of the DNA microarray
technology, in terms of the property of the arrayed DNA sequence with known identity. In variant
1, probe DNAs of 500-5,000 bases in length, such as cDNA, are immobilized on a solid surface such
as glass using robotic spotting and exposed to a set of targets either separately or in a mixture. In
variant 2, an array of oligonucleotide (20-80-mer) or peptide nucleic acid (PNA) probes
is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip
immobilization. The array is exposed to labeled sample DNA, hybridized, and the identity or
abundance of complementary sequences is determined. This method, "historically" called DNA
chips, is sold under the GeneChip® trademark. In contrast to cDNA spotting methods in which a
single clone is used for the analysis, GeneChip® arrays use multiple probes (e.g. 16 probe pairs
for one gene in the case of the Arabidopsis chip) to interrogate a chromosomal region. This
probe pairing strategy helps to identify and minimize the effects of non-specific hybridization
and background signals. A GeneChip® expression array can contain probes corresponding to a
number of reference and control genes. Using reference standards, it is possible to normalize
data from different experiments and compare multiple experiments on a quantitative level.

Some commercially available GeneChip probe arrays are manufactured using technology
that combines photolithographic methods and combinatorial chemistry. Tens to hundreds of
thousands of different oligonucleotide probes are synthesized on each array. Each probe type is
located in a specific area on the probe array called a probe cell. Each probe cell contains millions
of copies of a given probe. Probe arrays are manufactured in a series of cycles. A glass substrate
is coated with linkers containing photolabile protecting groups. Then, a mask is applied that
exposes selected portions of the probe array to ultraviolet light. Illumination removes the
photolabile protecting groups, enabling selective nucleoside phosphoramidite addition only at the
previously exposed sites. Next, a different mask is applied and the cycle of illumination and
chemical coupling is performed again. By repeating this cycle, a specific set of oligonucleotide
probes is synthesized, with each probe type in a known location. The completed probe arrays are
packaged into cartridges. Many companies are manufacturing oligonucleotide-based chips using
alternative in situ synthesis or deposition technologies.

The present invention combines the use of methylation-sensitive restriction enzymes,
DNA size fractionation and DNA microarray technology in a way that makes it feasible to
examine the level and the distribution of DNA methylation on a genome-wide scale. It also
allows resolution down to a gene, gene fragment or any chosen DNA sequence, and can be
subjected to quantification. Whereas all types of microarrays can be used in the context of the
present invention, it is an important aspect that the probe arrays are hybridized with a selected
size fraction of labeled genomic target DNA that has been restricted with a methylation-sensitive
endonuclease. Hybridization intensities of different sources of genomic target DNA are then
compared to each other and optionally to a control digestion with a methylation-insensitive
endonuclease to identify sequences showing different levels of methylation. The intensities are
indicative of the composition of the particular size fraction and therefore of the methylation
status of the genomic targets. Thus, a comparatively high intensity of hybridization of a probe
with a low molecular weight target fraction resulting from cleavage with an endonuclease
inhibited by the presence of 5-methylcytosine or N6-methyladenine in the recognition sequence
reflects hypomethylation in the target fraction relative to other samples or controls. Similarly, a
comparatively high intensity of hybridization of a probe with a low molecular weight target
fraction resulting from digestion with an endonuclease requiring the presence of 5-methylcytosine
or N6-methyladenine in the recognition sequence reflects hypermethylation in the target fraction
relative to other samples or controls.
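
The interpretation rule stated above can be sketched in a few lines of Python. This is an illustration only: the function name, the two-fold threshold and the intensity values used below are assumptions and are not part of the method as described.

```python
import math


def methylation_call(sample, reference, enzyme_mode="inhibited", min_fold=2.0):
    """Classify the relative methylation of one probe.

    sample, reference: hybridization intensities of the same probe with the
        low molecular weight fraction of two digested DNA samples.
    enzyme_mode: "inhibited" if methylation blocks cleavage,
        "requires" if the endonuclease cleaves only methylated sites.
    min_fold: hypothetical fold-change threshold (an assumption).
    """
    if sample <= 0 or reference <= 0:
        raise ValueError("intensities must be positive")
    if enzyme_mode not in ("inhibited", "requires"):
        raise ValueError("enzyme_mode must be 'inhibited' or 'requires'")

    log_ratio = math.log2(sample / reference)
    if abs(log_ratio) < math.log2(min_fold):
        return "no change"

    # A higher signal in the low molecular weight fraction means more cleavage.
    more_cleavage = log_ratio > 0
    if enzyme_mode == "inhibited":
        return "hypomethylated" if more_cleavage else "hypermethylated"
    return "hypermethylated" if more_cleavage else "hypomethylated"


print(methylation_call(800, 200))              # hypomethylated
print(methylation_call(800, 200, "requires"))  # hypermethylated
```

The same intensity ratio leads to opposite conclusions depending on whether the endonuclease is blocked by or dependent on methylation, which is why the enzyme type is an explicit argument.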

Summary

The present invention teaches a method to detect differences of genomic methylation 
comprising 

(a) separately cleaving different samples of genomic DNA with a sequence specific 
endonuclease whose cleavage activity is inhibited by or requires the presence of 5- 
methylcytosine or N6-methyladenine in the recognition sequence; 

(b) labelling a defined size fraction of the resulting DNA fragments; 

(c) separately hybridizing the labelled DNA fractions of step (b) to an array of DNA 
molecules representing a plurality of genomic DNA targets; and 

(d) quantifying the differences of the hybridization intensity patterns obtained in step (c). 

Additionally, as a hybridization control, the method may optionally comprise

(i) cleaving identical samples of genomic DNA with a sequence specific endonuclease
whose cleavage activity is indifferent to the presence of 5-methylcytosine or N6-methyladenine
in the recognition sequence and which is preferably an isoschizomer cleaving at the same
recognition site as the methylation-sensitive enzyme;

(ii) labelling the same size fraction of the resulting DNA fragments as in step (b); and

(iii) separately hybridizing the labelled DNA fractions of step (ii) to an identical array of
DNA molecules representing a plurality of genomic DNA targets.

Detailed Description 

The method is particularly useful to distinguish different cell types on the basis of their 
methylation pattern, which can be extremely useful in the context of cancer diagnosis and 
treatment. The method, however, is not restricted to the analysis of mammalian genome 
methylation, but can also be used for methylation pattern analysis in plants, animals, insects, 
fungi or microbes. Thus, it can be used in plants to compare methylation patterns in the context 
of heterosis. It is preferred that the DNA samples to be compared are of isogenic origin, such as
healthy versus tumour tissue of the same organism, parental DNA versus progeny or sibling
DNA, or DNA of isogenic organisms only differing by one or more specific mutations. In
general, with increasing genetic distance of the samples to be compared, interpretation of the 
results becomes more difficult. 

A number of different endonucleases inhibited by the presence of 5-methylcytosine or
N6-methyladenine can be used in the context of the present invention. Their respective
recognition sequences are usually defined by 4 to 8 base pairs, shorter recognition sequences of 4
to 6 base pairs being preferred. A number of particularly useful endonucleases are listed in
Table 1. Endonucleases which are useful at sites of overlapping CG are shown in Table 2.
Particularly preferred examples of recognition sequences are ACGT, GCGC, CCGG, TCGA and
CGCG.



Table 1:

Enzyme    Site        Enzyme    Site        Enzyme    Site
AatII     GACGTC      BstUI     CGCG        NarI      GGCGCC
AciI      CCGC        ClaI      ATCGAT      NciI      CCSGG
AclI      AACGTT      EagI      CGGCCG      NgoMI     GCCGGC
AgeI      ACCGGT      Fnu4HI    GCNGC       NotI      GCGGCCGC
AscI      GGCGCGCC    FseI      GGCCGGCC    NruI      TCGCGA
AvaI      CYCGRG      FspI      TGCGCA      PmlI      CACGTG
BsaAI     YACGTR      HaeII     RGCGCY      PvuI      CGATCG
BsaHI     GRCGYC      HgaI      GACGC       RsrII     CGGWCCG
BsiEI     CGRYCG      HhaI      GCGC        SacII     CCGCGG
BsiWI     CGTACG      HinP1I    GCGC        SalI      GTCGAC
BspDI     ATCGAT      HpaII     CCGG        SmaI      CCCGGG
BsrFI     RCCGGY      KasI      GGCGCC      SnaBI     TACGTA
BssHII    GCGCGC      MluI      ACGCGT      XhoI      CTCGAG
BstBI     TTCGAA      NaeI      GCCGGC

Table 2:

Enzyme    Site        Enzyme    Site        Enzyme    Site
AccI      GTMKAC      BsmAI     GTCTC       NheI      GCTAGC
Acc65I    GGTACC      BspEI     TCCGGA      RsaI      GTAC
ApaI      GGGCCC      BsrBI     GAGCGG      PaeR7I    CTCGAG
ApaLI     GTGCAC      DraIII    CACNNNGTG   PshAI     GACNNNNGTC
AvaII     GGWCC       DrdI      GACN6GTC    SalI      GTCGAC
BanI      GGYRCC      EaeI      YGGCCR      Sau3AI    GATC
BsaI      GGTCTC      EarI      CTCTTC      Sau96I    GGNCC
BsaBI     GATN4ATC    HincII    GTYRAC      ScrFI     CCNGG
BsgI      GTGCAG      HinfI     GANTC       SfiI      GGCC(N5)GGCC
BslI      CCN7GG      HpaI      GTTAAC

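
Several recognition sequences in Tables 1 and 2 use the standard IUPAC degenerate codes (R = A or G, Y = C or T, S = C or G, W = A or T, K = G or T, M = A or C, N = any base). The following Python sketch shows how such sites could be located in a sequence and how methylation might be taken into account. The function names are hypothetical, and the rule that any methylated base inside a site blocks cleavage is a simplifying assumption; real enzymes differ in which methylated positions they tolerate.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def find_sites(sequence, site):
    """Return 0-based start positions of all (possibly overlapping) matches.

    The site must be written out base by base, e.g. GACNNNNNNGTC
    rather than the shorthand GACN6GTC.
    """
    body = "".join(IUPAC[base] for base in site.upper())
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + body + "))")
    return [match.start() for match in pattern.finditer(sequence.upper())]


def cleavable_sites(sequence, site, methylated_positions):
    """Sites that contain no methylated base (simplifying assumption)."""
    length = len(site)
    return [
        start for start in find_sites(sequence, site)
        if not any(start <= pos < start + length for pos in methylated_positions)
    ]


print(find_sites("AACCGGTTCCGG", "CCGG"))            # [2, 8]
print(find_sites("CTCGAGCCCGGG", "CYCGRG"))          # [0, 6]
print(cleavable_sites("AACCGGTTCCGG", "CCGG", {3}))  # [8]
```

Shorthand such as N6 or (N5) in the tables would have to be expanded into the corresponding number of N characters before being passed to these functions.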

The type of DNA label is only important in the sense that it needs to be compatible with
the hybridization technology required for the probes immobilized on the solid array supports. In
general, any method which can label DNA quantitatively can be used for this application,
including but not limited to random oligonucleotide priming, nick translation, chemical labelling
of DNA such as labelling with biotin, and light-activated chemical labelling of DNA such as
labelling with psoralen-biotin and activation by UV light. Means to perform these methods are
commercially available. It is also possible to label the target DNA on the chip by primer
extension using the chip oligonucleotides as primers. A preferred support is the Affymetrix
GeneChip, which can be hybridized to the target DNA according to the manufacturer's instructions.

The preferred size fraction to be labelled primarily depends on the length of the recognition
sequence cleaved by the endonuclease. Thus, for a recognition sequence of 4 base pairs the
preferred fragment size is up to 3000 base pairs and preferentially between 100 and 2000 base
pairs. For a 6 base pair recognition sequence a fragment size of between 100 and 6000 base
pairs is suitable.
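
The link between recognition sequence length and fragment size can be illustrated with a rough estimate. Assuming a random sequence with equal base frequencies (a simplification that is not part of the description above), a site is expected on average once every 1/p base pairs, where p is the probability that a given position starts a match. The function name and the degeneracy table below are illustrative.

```python
# Number of bases matched by each IUPAC code.
DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "N": 4,
}


def expected_spacing(site):
    """Average distance in base pairs between sites in a random sequence
    with equal base frequencies (an idealized assumption)."""
    probability = 1.0
    for base in site.upper():
        probability *= DEGENERACY[base] / 4
    return 1 / probability


print(expected_spacing("CCGG"))    # 256.0
print(expected_spacing("GACGTC"))  # 4096.0
print(expected_spacing("CYCGRG"))  # 1024.0
```

Under this idealization a 4 base pair site yields fragments of a few hundred base pairs on average and a 6 base pair site fragments of a few thousand, which is in line with the size ranges given above. Real genomes deviate from the estimate, for example where CG dinucleotides are under-represented or where sites are methylated and therefore not cleaved.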

The specific probes of the DNA array may represent any type of genomic DNA, that is,
coding sequences such as cDNA or non-coding sequences such as promoters, enhancers,
terminators, introns, transposons etc. In the context of methylation studies it is preferred that the
specific probe arrays represent non-coding sequences such as regulatory sequences of the
genome of an organism.

Experimental data resulting from the hybridization can be analyzed using computer
software. Affymetrix, for example, offers a program called GeneChip Microarray Suite. This
program, for comparison between two chips, measures and normalizes the 'baseline chip'
intensity values to the average signal intensity. Intensity values of the 'experimental chip' are
then compared to the baseline chip and a 'difference change' is calculated. The output of the
software provides a qualitative call ('increase', 'marginal increase', 'no change', 'decrease' or
'marginal decrease') as well as discrete numerical metrics used to make the call. Expression
elements may be ranked based on absolute level of expression or relative change in expression
in a two-chip comparison. All numerical data may be exported to a Microsoft Excel
spreadsheet for further analysis. Expression elements are identified by a GenBank accession
number and the GeneChip analysis software allows for immediate hyperlink to the GenBank
entry for full sequence annotation.
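
The general idea of a two-chip comparison (scale each chip to its average intensity, then compare probe by probe) can be sketched as follows. This is not the algorithm used by the Affymetrix software; the function names, the two-fold threshold and the three-level call are assumptions chosen for illustration.

```python
from statistics import mean


def scale_to_average(intensities):
    """Divide every probe intensity by the chip-wide average intensity."""
    average = mean(list(intensities.values()))
    if average <= 0:
        raise ValueError("average intensity must be positive")
    return {probe: value / average for probe, value in intensities.items()}


def compare_chips(baseline, experimental, min_fold=2.0):
    """Return a qualitative call per probe present on both chips.

    min_fold is a hypothetical threshold, not a value from the software.
    """
    base = scale_to_average(baseline)
    test = scale_to_average(experimental)
    calls = {}
    for probe in base.keys() & test.keys():
        if base[probe] <= 0 or test[probe] <= 0:
            calls[probe] = "no call"
            continue
        fold = test[probe] / base[probe]
        if fold >= min_fold:
            calls[probe] = "increase"
        elif fold <= 1 / min_fold:
            calls[probe] = "decrease"
        else:
            calls[probe] = "no change"
    return calls


baseline = {"a": 100, "b": 100, "c": 100, "d": 100}
experimental = {"a": 600, "b": 20, "c": 190, "d": 190}
print(compare_chips(baseline, experimental))
```

Scaling by the chip-wide average compensates for overall differences in labelling efficiency or scanner gain, so that only probe-specific differences contribute to the call.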

Examples

Example 1 - Preparation, labelling and hybridization of genomic target DNA 

Genomic DNA of Arabidopsis thaliana mutant som8 plants (O. Mittelsten Scheid et al.
(1998), Proc. Natl. Acad. Sci. USA 95, 632-637) and corresponding wild type plants is isolated
by standard biochemical procedures ensuring high molecular weight and sufficient purity for
enzymatic modifications. The DNA is subjected to endonucleolytic cleavage with the sequence
specific endonuclease HpaII, the activity of which is blocked by the presence of 5-methylcytosine
in the recognition sequence. As a control, the same DNA sample is digested with the
endonuclease MspI. After endonucleolytic cleavage and agarose gel electrophoresis, various size
fractions are eluted from the gel and labelled using the Life Technologies DNA Labelling system
with some modifications and scaled up to 200 µl:

1. 10 µl of a DNA size fraction eluted from the gel, generally containing between
0.5-1.5 µg of DNA and preferably 0.8 µg of DNA, are mixed with 10 µl H2O and 80 µl of a
2.5x Random Primer Solution and placed on ice;

2. The mixture is then boiled for 5 minutes and placed on ice again immediately;

3. Thereafter 20 µl of a 10x dNTP mixture including Biotin-14-dCTP, 72 µl H2O and
8 µl Klenow fragment are added, mixed and briefly (about 10 seconds) centrifuged before
incubation at 37°C for 4-6 hours;

4. The reaction is terminated by the addition of 20 µl stop buffer;

5. DNA is precipitated by the addition of 22 µl 3M NaAc and 440 µl ethanol
(95-99%) and leaving the DNA on dry ice for 20 minutes or at -20°C overnight;

6. The DNA is then collected by centrifugation at 4°C for 10 minutes at maximum
speed of a table top centrifuge and discarding the supernatant;

7. The pellet is then redissolved in 200 µl H2O and reprecipitated with ethanol (steps 5-6);

8. Finally the DNA pellet is dissolved in 100 µl of H2O for use in the hybridization
procedure.
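
The volumes in steps 1 to 5 can be checked with simple arithmetic. The short Python sketch below uses only the volumes stated in the protocol. Reading the 22 µl of 3M NaAc as one tenth volume and the 440 µl of ethanol as two volumes is an interpretation, not something the protocol states explicitly.

```python
# Volumes in microlitres, taken from steps 1 and 3 of the protocol.
labelling_reaction = {
    "DNA size fraction": 10,
    "H2O (step 1)": 10,
    "2.5x random primer solution": 80,
    "10x dNTP mixture": 20,
    "H2O (step 3)": 72,
    "Klenow fragment": 8,
}
reaction_volume = sum(labelling_reaction.values())

# Final concentrations of the stock solutions in the complete reaction.
primer_final_x = 2.5 * labelling_reaction["2.5x random primer solution"] / reaction_volume
dntp_final_x = 10 * labelling_reaction["10x dNTP mixture"] / reaction_volume

# Step 4 adds stop buffer; step 5 precipitates the resulting volume.
stop_buffer = 20
volume_before_precipitation = reaction_volume + stop_buffer
sodium_acetate = volume_before_precipitation / 10  # one tenth volume
ethanol = 2 * volume_before_precipitation          # two volumes

print(reaction_volume)                # 200
print(primer_final_x, dntp_final_x)   # 1.0 1.0
print(volume_before_precipitation)    # 220
print(sodium_acetate, ethanol)        # 22.0 440
```

The components add up to the stated 200 µl reaction, both stock solutions end up at 1x, and the precipitation volumes match the 22 µl and 440 µl given in step 5.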

The commercially available GeneChip® Arabidopsis genome array used in this example 
contains probe sets interrogating more than 8,200 genes and more than 100 EST clusters for 
Arabidopsis thaliana. Eighty percent of the genes represented on the array are predicted coding 
sequences from genomic BAC entries. Twenty percent are from high quality cDNA sequences. 
The array also contains more than 100 EST clusters, sharing homology with the predicted coding 
sequences from BAC clones.

The labelled target DNA of example 1 is hybridized to the Arabidopsis Genome Array
and further analyzed as described in example 1 of EP-A-999 285 and corresponding US Patent
No. 6,203,989.

Example 2 - Methylation patterns in som8 plants as compared to wild type plants 

Using gene-by-gene analysis it has been shown previously that the DNA fraction which
is preferentially demethylated in som8 Arabidopsis mutants is composed of remnants of
transposons and of repetitive DNA. The experimental data derived from the hybridization
studies described in example 1, wherein the methylation status of more than 8000 genes is
studied in a single experiment, are in agreement with the previous results and additionally
provide a direct, unbiased and broader picture of genome-wide DNA methylation changes in
som8 plants as compared to wild type plants. Of the 8000 genes studied, 124 can be
characterized as being related to transposable elements, and the experimental data confirm that
transposable elements are preferentially demethylated in som8 plants as compared to the control
wild type plants. The decrease in methylation level of the transposons correlates well with their
transcriptional reactivation in many independent examples. However, a subset of transposons,
although demethylated, remains transcriptionally inactive. In addition, it is found that selected
genes and members of multigene families are also subjected to demethylation and transcriptional
reactivation in som8 plants, similar to the subset of transposons. Among this group of genes,
those encoding pathogen resistance determinants are the most prominent examples.

What is claimed is:

1. A method to detect differences of genomic methylation comprising

(a) separately cleaving different samples of genomic DNA with a sequence
specific endonuclease whose cleavage activity is inhibited by or requires the presence of
5-methylcytosine or N6-methyladenine in the recognition sequence;

(b) labelling a defined size fraction of the resulting DNA fragments;

(c) separately hybridizing the labelled DNA fractions of step (b) to an array of
DNA molecules representing a plurality of genomic DNA targets; and

(d) quantifying the differences of the hybridization intensity patterns obtained in
step (c).

2. The method of claim 1 additionally comprising

(i) cleaving identical samples of genomic DNA with a sequence specific
endonuclease whose cleavage activity is indifferent to the presence of 5-methylcytosine
or N6-methyladenine in the recognition sequence;

(ii) labelling the same size fraction of the resulting DNA fragments as in (b); and

(iii) separately hybridizing the labelled DNA fractions of (ii) as a control to an
identical array of DNA molecules representing a plurality of genomic DNA targets.

3. The method of claim 2, wherein the endonuclease inhibited by the presence of 5- 
methylcytosine or N6-methyladenine and the endonuclease indifferent to the presence of 5- 
methylcytosine or N6-methyladenine cleave at the same recognition sequence. 

4. The method of claim 1, wherein the recognition sequence of the endonuclease has a
length of 4 base pairs.

5. The method of claim 4, wherein the recognition sequence of the endonuclease is ACGT, 
GCGC, CCGG, TCGA or CGCG. 

6. The method of claim 1, wherein labelling of the DNA fragments is performed by random 
DNA priming.